
# Multimodal feature extraction

CLIP ViT L Rho50 K1 Constrained FARE2
MIT
A feature extraction model fine-tuned from openai/clip-vit-large-patch14, with both the image and text encoders optimized.
Multimodal Fusion Transformers
LEAF-CLIP
253
0
Moonvit SO 400M
MIT
MoonViT is a native-resolution vision encoder, initialized from SigLIP-SO-400M and then continually pre-trained, suitable for image feature extraction tasks.
Image Enhancement Transformers
moonshotai
275
12
Vit Large Patch14 Clip 224.dfn2b
Other
A Vision Transformer model based on the CLIP architecture, released by Apple and focused on image feature extraction.
Image Classification Transformers
timm
178
0